Overview

Dataset statistics

Number of variables: 16
Number of observations: 355654
Missing cells: 876771
Missing cells (%): 15.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 46.1 MiB
Average record size in memory: 136.0 B

Variable types

Categorical: 9
Numeric: 7

Alerts

Open has constant value "1.0" (Constant)
Date has a high cardinality: 396 distinct values (High cardinality)
StateHoliday is highly correlated with Open (High correlation)
Promo is highly correlated with Open (High correlation)
Open is highly correlated with StateHoliday and 6 other fields (High correlation)
Assortment is highly correlated with Open and 1 other field (High correlation)
Promo2 is highly correlated with Open and 1 other field (High correlation)
PromoInterval is highly correlated with Open and 1 other field (High correlation)
StoreType is highly correlated with Open and 1 other field (High correlation)
SchoolHoliday is highly correlated with Open (High correlation)
StoreType is highly correlated with Assortment (High correlation)
Assortment is highly correlated with StoreType (High correlation)
CompetitionOpenSinceYear is highly correlated with Promo2SinceWeek (High correlation)
Promo2SinceWeek is highly correlated with CompetitionOpenSinceYear and 2 other fields (High correlation)
Promo2SinceYear is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
PromoInterval is highly correlated with Promo2SinceWeek and 1 other field (High correlation)
DayOfWeek has 10733 (3.0%) missing values (Missing)
Open has 10710 (3.0%) missing values (Missing)
Promo has 10639 (3.0%) missing values (Missing)
StateHoliday has 10851 (3.1%) missing values (Missing)
SchoolHoliday has 10885 (3.1%) missing values (Missing)
StoreType has 10743 (3.0%) missing values (Missing)
Assortment has 10743 (3.0%) missing values (Missing)
CompetitionDistance has 11658 (3.3%) missing values (Missing)
CompetitionOpenSinceMonth has 120379 (33.8%) missing values (Missing)
CompetitionOpenSinceYear has 120379 (33.8%) missing values (Missing)
Promo2 has 10743 (3.0%) missing values (Missing)
Promo2SinceWeek has 179436 (50.5%) missing values (Missing)
Promo2SinceYear has 179436 (50.5%) missing values (Missing)
PromoInterval has 179436 (50.5%) missing values (Missing)
Store has 10743 (3.0%) zeros (Zeros)
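
The per-column missing counts listed in the alerts can be cross-checked against the "Missing cells" figures in the dataset statistics. The following is a minimal sketch in plain Python; the dictionary only restates counts reported in this section, and the variable names are illustrative rather than part of the report.

```python
# Per-column missing counts, copied from the "Missing" alerts in this report.
missing_by_column = {
    "DayOfWeek": 10733,
    "Open": 10710,
    "Promo": 10639,
    "StateHoliday": 10851,
    "SchoolHoliday": 10885,
    "StoreType": 10743,
    "Assortment": 10743,
    "CompetitionDistance": 11658,
    "CompetitionOpenSinceMonth": 120379,
    "CompetitionOpenSinceYear": 120379,
    "Promo2": 10743,
    "Promo2SinceWeek": 179436,
    "Promo2SinceYear": 179436,
    "PromoInterval": 179436,
}

n_variables = 16         # "Number of variables"
n_observations = 355654  # "Number of observations"

missing_cells = sum(missing_by_column.values())
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(missing_cells)          # 876771
print(round(missing_pct, 1))  # 15.4
```

Date and Store have no missing values, so they contribute nothing to the sum.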

Reproduction

Analysis started: 2021-10-29 23:27:15.205478
Analysis finished: 2021-10-29 23:27:50.399153
Duration: 35.19 seconds
Software version: pandas-profiling v3.1.0
Download configuration: config.json

Variables

Date
Categorical

HIGH CARDINALITY

Distinct: 396
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 5.4 MiB

Value | Count
2013-11-07 | 1100
2013-12-30 | 1095
2013-09-23 | 1095
2013-10-01 | 1094
2013-12-09 | 1093
Other values (391) | 350177

Length

Max length: 10
Median length: 10
Mean length: 10
Min length: 10

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2013-01-01
2nd row: 2013-01-01
3rd row: 2013-01-01
4th row: 2013-01-01
5th row: 2013-01-01

Common Values

Value | Count | Frequency (%)
2013-11-07 | 1100 | 0.3%
2013-12-30 | 1095 | 0.3%
2013-09-23 | 1095 | 0.3%
2013-10-01 | 1094 | 0.3%
2013-12-09 | 1093 | 0.3%
2013-09-16 | 1092 | 0.3%
2013-12-04 | 1092 | 0.3%
2013-08-09 | 1092 | 0.3%
2014-01-03 | 1090 | 0.3%
2013-08-31 | 1090 | 0.3%
Other values (386) | 344721 | 96.9%

Length

[Figure: histogram of lengths of the category]

Store
Real number (ℝ≥0)

ZEROS

Distinct: 1116
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 541.1932524
Minimum: 0
Maximum: 1115
Zeros: 10743
Zeros (%): 3.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.4 MiB

Quantile statistics

Minimum: 0
5-th percentile: 23
Q1: 254
Median: 540
Q3: 828
95-th percentile: 1058
Maximum: 1115
Range: 1115
Interquartile range (IQR): 574

Descriptive statistics

Standard deviation: 330.7948258
Coefficient of variation (CV): 0.6112323542
Kurtosis: -1.207844824
Mean: 541.1932524
Median Absolute Deviation (MAD): 287
Skewness: 0.008355969797
Sum: 192477545
Variance: 109425.2168
Monotonicity: Not monotonic
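
Several of the figures above are derived from the others. As a rough sketch (plain Python, with the inputs copied from the Store summary; the variable names are illustrative), the range, IQR, coefficient of variation, variance and sum can be recomputed as follows.

```python
import math

# Inputs copied from the Store statistics in this report.
n_observations = 355654
minimum, maximum = 0, 1115
q1, q3 = 254, 828
mean = 541.1932524
std = 330.7948258

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range (IQR)
cv = std / mean                  # Coefficient of variation (CV)
variance = std ** 2              # Variance
total = mean * n_observations    # Sum (Store has no missing values)

print(value_range, iqr)          # 1115 574
print(math.isclose(cv, 0.6112323542, rel_tol=1e-6))
print(math.isclose(variance, 109425.2168, rel_tol=1e-6))
print(abs(total - 192477545) < 1)
```

The same relationships hold for the other numeric variables, except that the sum divides by the number of non-missing observations rather than the full row count.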
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 10743 | 3.0%
494 | 379 | 0.1%
335 | 377 | 0.1%
733 | 377 | 0.1%
769 | 375 | 0.1%
562 | 375 | 0.1%
85 | 374 | 0.1%
262 | 374 | 0.1%
1097 | 370 | 0.1%
948 | 367 | 0.1%
Other values (1106) | 341543 | 96.0%

Minimum values

Value | Count | Frequency (%)
0 | 10743 | 3.0%
1 | 310 | 0.1%
2 | 313 | 0.1%
3 | 307 | 0.1%
4 | 312 | 0.1%
5 | 314 | 0.1%
6 | 307 | 0.1%
7 | 310 | 0.1%
8 | 305 | 0.1%
9 | 310 | 0.1%

Maximum values

Value | Count | Frequency (%)
1115 | 313 | 0.1%
1114 | 308 | 0.1%
1113 | 317 | 0.1%
1112 | 312 | 0.1%
1111 | 315 | 0.1%
1110 | 303 | 0.1%
1109 | 301 | 0.1%
1108 | 314 | 0.1%
1107 | 304 | 0.1%
1106 | 308 | 0.1%

DayOfWeek
Real number (ℝ≥0)

MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 10733
Missing (%): 3.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.526335016
Minimum: 1
Maximum: 7
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 5
95-th percentile: 6
Maximum: 7
Range: 6
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.723776397
Coefficient of variation (CV): 0.4888294474
Kurtosis: -1.264191727
Mean: 3.526335016
Median Absolute Deviation (MAD): 2
Skewness: 0.006356282238
Sum: 1216307
Variance: 2.971405068
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=7)]

Common values

Value | Count | Frequency (%)
2 | 58613 | 16.5%
6 | 58540 | 16.5%
5 | 57980 | 16.3%
3 | 56463 | 15.9%
1 | 56304 | 15.8%
4 | 55633 | 15.6%
7 | 1388 | 0.4%
(Missing) | 10733 | 3.0%

Minimum values

Value | Count | Frequency (%)
1 | 56304 | 15.8%
2 | 58613 | 16.5%
3 | 56463 | 15.9%
4 | 55633 | 15.6%
5 | 57980 | 16.3%
6 | 58540 | 16.5%
7 | 1388 | 0.4%

Maximum values

Value | Count | Frequency (%)
7 | 1388 | 0.4%
6 | 58540 | 16.5%
5 | 57980 | 16.3%
4 | 55633 | 15.6%
3 | 56463 | 15.9%
2 | 58613 | 16.5%
1 | 56304 | 15.8%

Open
Categorical

CONSTANT
HIGH CORRELATION
MISSING
REJECTED

Distinct: 1
Distinct (%): < 0.1%
Missing: 10710
Missing (%): 3.0%
Memory size: 5.4 MiB

Value | Count
1.0 | 344944

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
1.0 | 344944 | 97.0%
(Missing) | 10710 | 3.0%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]
Value | Count | Frequency (% of non-missing)
1.0 | 344944 | 100.0%

Promo
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 10639
Missing (%): 3.0%
Memory size: 5.4 MiB

Value | Count
0.0 | 196560
1.0 | 148455

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
0.0 | 196560 | 55.3%
1.0 | 148455 | 41.7%
(Missing) | 10639 | 3.0%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]
Value | Count | Frequency (% of non-missing)
0.0 | 196560 | 57.0%
1.0 | 148455 | 43.0%

StateHoliday
Categorical

HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 10851
Missing (%): 3.1%
Memory size: 5.4 MiB

Value | Count
0 | 344465
a | 257
b | 46
c | 35

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: a
2nd row: a
3rd row: a
4th row: a
5th row: a

Common Values

Value | Count | Frequency (%)
0 | 344465 | 96.9%
a | 257 | 0.1%
b | 46 | < 0.1%
c | 35 | < 0.1%
(Missing) | 10851 | 3.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]
Value | Count | Frequency (% of non-missing)
0 | 344465 | 99.9%
a | 257 | 0.1%
b | 46 | < 0.1%
c | 35 | < 0.1%

SchoolHoliday
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 10885
Missing (%): 3.1%
Memory size: 5.4 MiB

Value | Count
0.0 | 276262
1.0 | 68507

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 1.0
5th row: 1.0

Common Values

Value | Count | Frequency (%)
0.0 | 276262 | 77.7%
1.0 | 68507 | 19.3%
(Missing) | 10885 | 3.1%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]
Value | Count | Frequency (% of non-missing)
0.0 | 276262 | 80.1%
1.0 | 68507 | 19.9%

StoreType
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 4
Distinct (%): < 0.1%
Missing: 10743
Missing (%): 3.0%
Memory size: 5.4 MiB

Value | Count
a | 186008
d | 107285
c | 45533
b | 6085

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: b
2nd row: b
3rd row: b
4th row: b
5th row: a

Common Values

Value | Count | Frequency (%)
a | 186008 | 52.3%
d | 107285 | 30.2%
c | 45533 | 12.8%
b | 6085 | 1.7%
(Missing) | 10743 | 3.0%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]
Value | Count | Frequency (% of non-missing)
a | 186008 | 53.9%
d | 107285 | 31.1%
c | 45533 | 13.2%
b | 6085 | 1.8%

Assortment
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 10743
Missing (%): 3.0%
Memory size: 5.4 MiB

Value | Count
a | 183012
c | 158619
b | 3280

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: b
2nd row: a
3rd row: b
4th row: a
5th row: c

Common Values

Value | Count | Frequency (%)
a | 183012 | 51.5%
c | 158619 | 44.6%
b | 3280 | 0.9%
(Missing) | 10743 | 3.0%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]
Value | Count | Frequency (% of non-missing)
a | 183012 | 53.1%
c | 158619 | 46.0%
b | 3280 | 1.0%

CompetitionDistance
Real number (ℝ≥0)

MISSING

Distinct: 654
Distinct (%): 0.2%
Missing: 11658
Missing (%): 3.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 5431.833132
Minimum: 20
Maximum: 75860
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.4 MiB

Quantile statistics

Minimum: 20
5-th percentile: 140
Q1: 720
Median: 2320
Q3: 6890
95-th percentile: 20390
Maximum: 75860
Range: 75840
Interquartile range (IQR): 6170

Descriptive statistics

Standard deviation: 7747.339838
Coefficient of variation (CV): 1.42628458
Kurtosis: 13.44627106
Mean: 5431.833132
Median Absolute Deviation (MAD): 1970
Skewness: 2.971624023
Sum: 1868528870
Variance: 60021274.56
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
250 | 3701 | 1.0%
1200 | 2754 | 0.8%
50 | 2486 | 0.7%
350 | 2463 | 0.7%
190 | 2454 | 0.7%
150 | 2220 | 0.6%
90 | 2205 | 0.6%
180 | 2147 | 0.6%
330 | 2144 | 0.6%
110 | 1860 | 0.5%
Other values (644) | 319562 | 89.9%
(Missing) | 11658 | 3.3%

Minimum values

Value | Count | Frequency (%)
20 | 308 | 0.1%
30 | 1229 | 0.3%
40 | 1533 | 0.4%
50 | 2486 | 0.7%
60 | 925 | 0.3%
70 | 1538 | 0.4%
80 | 924 | 0.3%
90 | 2205 | 0.6%
100 | 1546 | 0.4%
110 | 1860 | 0.5%

Maximum values

Value | Count | Frequency (%)
75860 | 348 | 0.1%
58260 | 351 | 0.1%
48330 | 318 | 0.1%
46590 | 316 | 0.1%
45740 | 301 | 0.1%
44320 | 311 | 0.1%
40860 | 338 | 0.1%
40540 | 313 | 0.1%
38710 | 309 | 0.1%
38630 | 346 | 0.1%

CompetitionOpenSinceMonth
Real number (ℝ≥0)

MISSING

Distinct: 12
Distinct (%): < 0.1%
Missing: 120379
Missing (%): 33.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.228532568
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 4
Median: 8
Q3: 10
95-th percentile: 12
Maximum: 12
Range: 11
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.208218218
Coefficient of variation (CV): 0.4438270406
Kurtosis: -1.240989051
Mean: 7.228532568
Median Absolute Deviation (MAD): 3
Skewness: -0.1727582443
Sum: 1700693
Variance: 10.29266413
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=12)]

Common values

Value | Count | Frequency (%)
9 | 38861 | 10.9%
4 | 29158 | 8.2%
11 | 28421 | 8.0%
3 | 21525 | 6.1%
7 | 20583 | 5.8%
12 | 19738 | 5.5%
10 | 18955 | 5.3%
6 | 15543 | 4.4%
5 | 13522 | 3.8%
2 | 12628 | 3.6%
Other values (2) | 16341 | 4.6%
(Missing) | 120379 | 33.8%

Minimum values

Value | Count | Frequency (%)
1 | 4319 | 1.2%
2 | 12628 | 3.6%
3 | 21525 | 6.1%
4 | 29158 | 8.2%
5 | 13522 | 3.8%
6 | 15543 | 4.4%
7 | 20583 | 5.8%
8 | 12022 | 3.4%
9 | 38861 | 10.9%
10 | 18955 | 5.3%

Maximum values

Value | Count | Frequency (%)
12 | 19738 | 5.5%
11 | 28421 | 8.0%
10 | 18955 | 5.3%
9 | 38861 | 10.9%
8 | 12022 | 3.4%
7 | 20583 | 5.8%
6 | 15543 | 4.4%
5 | 13522 | 3.8%
4 | 29158 | 8.2%
3 | 21525 | 6.1%

CompetitionOpenSinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 23
Distinct (%): < 0.1%
Missing: 120379
Missing (%): 33.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 2008.66883
Minimum: 1900
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.4 MiB

Quantile statistics

Minimum: 1900
5-th percentile: 2001
Q1: 2006
Median: 2010
Q3: 2013
95-th percentile: 2014
Maximum: 2015
Range: 115
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 6.200434621
Coefficient of variation (CV): 0.003086837675
Kurtosis: 127.4467121
Mean: 2008.66883
Median Absolute Deviation (MAD): 3
Skewness: -8.013484848
Sum: 472589559
Variance: 38.44538948
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=23)]

Common values

Value | Count | Frequency (%)
2013 | 25863 | 7.3%
2012 | 25353 | 7.1%
2014 | 21630 | 6.1%
2005 | 19093 | 5.4%
2010 | 17068 | 4.8%
2011 | 16887 | 4.7%
2008 | 16649 | 4.7%
2009 | 16610 | 4.7%
2007 | 14831 | 4.2%
2006 | 14473 | 4.1%
Other values (13) | 46818 | 13.2%
(Missing) | 120379 | 33.8%

Minimum values

Value | Count | Frequency (%)
1900 | 311 | 0.1%
1961 | 313 | 0.1%
1990 | 1538 | 0.4%
1994 | 606 | 0.2%
1995 | 616 | 0.2%
1998 | 310 | 0.1%
1999 | 2519 | 0.7%
2000 | 3095 | 0.9%
2001 | 4944 | 1.4%
2002 | 8391 | 2.4%

Maximum values

Value | Count | Frequency (%)
2015 | 11547 | 3.2%
2014 | 21630 | 6.1%
2013 | 25863 | 7.3%
2012 | 25353 | 7.1%
2011 | 16887 | 4.7%
2010 | 17068 | 4.8%
2009 | 16610 | 4.7%
2008 | 16649 | 4.7%
2007 | 14831 | 4.2%
2006 | 14473 | 4.1%

Promo2
Categorical

HIGH CORRELATION
MISSING

Distinct: 2
Distinct (%): < 0.1%
Missing: 10743
Missing (%): 3.0%
Memory size: 5.4 MiB

Value | Count
1.0 | 176218
0.0 | 168693

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1.0
2nd row: 1.0
3rd row: 1.0
4th row: 0.0
5th row: 0.0

Common Values

Value | Count | Frequency (%)
1.0 | 176218 | 49.5%
0.0 | 168693 | 47.4%
(Missing) | 10743 | 3.0%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]
Value | Count | Frequency (% of non-missing)
1.0 | 176218 | 51.1%
0.0 | 168693 | 48.9%

Promo2SinceWeek
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 24
Distinct (%): < 0.1%
Missing: 179436
Missing (%): 50.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 23.58940063
Minimum: 1
Maximum: 50
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.4 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 13
Median: 22
Q3: 37
95-th percentile: 45
Maximum: 50
Range: 49
Interquartile range (IQR): 24

Descriptive statistics

Standard deviation: 14.13311863
Coefficient of variation (CV): 0.5991300435
Kurtosis: -1.387343646
Mean: 23.58940063
Median Absolute Deviation (MAD): 13
Skewness: 0.07517852508
Sum: 4156877
Variance: 199.7450421
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=24)]

Common values

Value | Count | Frequency (%)
14 | 25104 | 7.1%
40 | 23658 | 6.7%
31 | 13658 | 3.8%
10 | 12918 | 3.6%
5 | 12080 | 3.4%
37 | 10806 | 3.0%
1 | 10795 | 3.0%
13 | 10471 | 2.9%
45 | 10439 | 2.9%
22 | 10176 | 2.9%
Other values (14) | 36113 | 10.2%
(Missing) | 179436 | 50.5%

Minimum values

Value | Count | Frequency (%)
1 | 10795 | 3.0%
5 | 12080 | 3.4%
6 | 313 | 0.1%
9 | 4338 | 1.2%
10 | 12918 | 3.6%
13 | 10471 | 2.9%
14 | 25104 | 7.1%
18 | 8901 | 2.5%
22 | 10176 | 2.9%
23 | 1500 | 0.4%

Maximum values

Value | Count | Frequency (%)
50 | 312 | 0.1%
49 | 304 | 0.1%
48 | 2861 | 0.8%
45 | 10439 | 2.9%
44 | 931 | 0.3%
40 | 23658 | 6.7%
39 | 1846 | 0.5%
37 | 10806 | 3.0%
36 | 3072 | 0.9%
35 | 7738 | 2.2%

Promo2SinceYear
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 7
Distinct (%): < 0.1%
Missing: 179436
Missing (%): 50.5%
Infinite: 0
Infinite (%): 0.0%
Mean: 2011.762856
Minimum: 2009
Maximum: 2015
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 5.4 MiB

Quantile statistics

Minimum: 2009
5-th percentile: 2009
Q1: 2011
Median: 2012
Q3: 2013
95-th percentile: 2014
Maximum: 2015
Range: 6
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.671574968
Coefficient of variation (CV): 0.0008309006019
Kurtosis: -1.059078889
Mean: 2011.762856
Median Absolute Deviation (MAD): 1
Skewness: -0.1187247114
Sum: 354508827
Variance: 2.794162874
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=7)]

Common values

Value | Count | Frequency (%)
2011 | 39517 | 11.1%
2013 | 37063 | 10.4%
2014 | 29152 | 8.2%
2012 | 25162 | 7.1%
2009 | 22466 | 6.3%
2010 | 19763 | 5.6%
2015 | 3095 | 0.9%
(Missing) | 179436 | 50.5%

Minimum values

Value | Count | Frequency (%)
2009 | 22466 | 6.3%
2010 | 19763 | 5.6%
2011 | 39517 | 11.1%
2012 | 25162 | 7.1%
2013 | 37063 | 10.4%
2014 | 29152 | 8.2%
2015 | 3095 | 0.9%

Maximum values

Value | Count | Frequency (%)
2015 | 3095 | 0.9%
2014 | 29152 | 8.2%
2013 | 37063 | 10.4%
2012 | 25162 | 7.1%
2011 | 39517 | 11.1%
2010 | 19763 | 5.6%
2009 | 22466 | 6.3%

PromoInterval
Categorical

HIGH CORRELATION
HIGH CORRELATION
MISSING

Distinct: 3
Distinct (%): < 0.1%
Missing: 179436
Missing (%): 50.5%
Memory size: 5.4 MiB

Value | Count
Jan,Apr,Jul,Oct | 103464
Feb,May,Aug,Nov | 40088
Mar,Jun,Sept,Dec | 32666

Length

Max length: 16
Median length: 15
Mean length: 15.18537266
Min length: 15

Characters and Unicode

Total characters: 0
Distinct characters: 0
Distinct categories: 0
Distinct scripts: 0
Distinct blocks: 0
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Feb,May,Aug,Nov
2nd row: Jan,Apr,Jul,Oct
3rd row: Mar,Jun,Sept,Dec
4th row: Jan,Apr,Jul,Oct
5th row: Feb,May,Aug,Nov

Common Values

Value | Count | Frequency (%)
Jan,Apr,Jul,Oct | 103464 | 29.1%
Feb,May,Aug,Nov | 40088 | 11.3%
Mar,Jun,Sept,Dec | 32666 | 9.2%
(Missing) | 179436 | 50.5%

Length

[Figure: histogram of lengths of the category]

Pie chart

[Figure: pie chart]
Value | Count | Frequency (% of non-missing)
jan,apr,jul,oct | 103464 | 58.7%
feb,may,aug,nov | 40088 | 22.7%
mar,jun,sept,dec | 32666 | 18.5%

Interactions

[Figures: 49 pairwise interaction plots]

Correlations

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
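
The calculation described above can be sketched in plain Python: rank both variables, then apply the Pearson formula to the ranks. The function names and the toy data are illustrative assumptions, not the report's implementation; ties receive the average of their ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied values starting at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear

rho = spearman(x, y)     # close to 1: the relationship is perfectly monotonic
r = pearson(x, y)        # below 1: the relationship is not linear
```

On this toy input ρ is 1 up to floating-point error while r is noticeably smaller, which is the difference between the two coefficients that the text describes.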

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line, that is its angle to the x-axis, does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
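
As an illustration of the formula and of the invariance property, here is a small sketch in plain Python. The function name and the toy data are assumptions made for the example; the scale factors are kept positive, since a negative factor would flip the sign of r.

```python
from math import sqrt


def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 3.0, 3.5, 9.0]

r = pearson_r(x, y)

# Shift and rescale each variable separately (positive scale factors).
r_transformed = pearson_r([3 * a + 10 for a in x], [0.5 * b - 2 for b in y])

# Two exact linear relationships with very different slopes.
r_steep = pearson_r(x, [2.0 * a for a in x])
r_shallow = pearson_r(x, [0.1 * a for a in x])
```

Here `r` and `r_transformed` agree up to floating-point error, and both `r_steep` and `r_shallow` equal 1 regardless of the slope.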

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
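
The pair-counting definition above corresponds to the variant usually called τ-a. A minimal sketch in plain Python follows; the function name and data are illustrative, and library implementations may default to a tie-adjusted variant (SciPy's `kendalltau`, for example, defaults to τ-b), so results can differ when ties are present.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs; tied pairs count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


x = [1, 2, 3, 4]
y = [1, 3, 2, 4]  # one swapped pair: 5 concordant pairs, 1 discordant pair

tau = kendall_tau_a(x, y)  # (5 - 1) / 6
```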

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
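
For orientation, the bias-corrected estimator can be sketched in plain Python from a contingency table of counts. This is an illustrative reading of Bergsma's correction, not necessarily the exact code path used by the report; the function name is hypothetical and the sketch assumes no empty rows or columns.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    r, k = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(k)]

    # Pearson chi-squared statistic against the independence model.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected value of phi2 under independence, floored at zero.
    phi2_corrected = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Shrink the table dimensions correspondingly.
    r_corrected = r - (r - 1) ** 2 / (n - 1)
    k_corrected = k - (k - 1) ** 2 / (n - 1)
    return sqrt(phi2_corrected / min(k_corrected - 1, r_corrected - 1))


independent = [[10, 20], [20, 40]]  # rows are proportional to each other
perfect = [[50, 0], [0, 50]]        # each row determines the column

v_independent = cramers_v_corrected(independent)  # 0.0
v_perfect = cramers_v_corrected(perfect)          # 1.0 up to rounding
```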

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

[Figure: bar chart] A simple visualization of nullity by column.
[Figure: nullity matrix] The nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
[Figure: correlation heatmap] The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
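
One common way to quantify nullity correlation is the Pearson correlation between the missingness indicators of two columns. The sketch below uses plain Python with `None` standing in for a missing value; the function name and the toy columns are illustrative assumptions, not data or code from the report.

```python
from math import sqrt


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'is missing' indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (w - mb) for u, w in zip(a, b))
    sa = sqrt(sum((u - ma) ** 2 for u in a))
    sb = sqrt(sum((w - mb) ** 2 for w in b))
    if sa == 0 or sb == 0:
        # A column that is always present or always missing has no variation.
        return float("nan")
    return cov / (sa * sb)


# Toy columns: the first two are missing on exactly the same rows,
# the third is missing exactly where the first is present.
since_week = [14.0, 31.0, 5.0, None, None, None]
interval = ["Feb,May,Aug,Nov", "Jan,Apr,Jul,Oct", "Mar,Jun,Sept,Dec", None, None, None]
opposite = [None, None, None, 1.0, 2.0, 3.0]

together = nullity_correlation(since_week, interval)  # 1.0
apart = nullity_correlation(since_week, opposite)     # -1.0
```

A value near +1 means the two columns tend to be missing on the same rows, which is the pattern suggested by the identical missing counts of Promo2SinceWeek, Promo2SinceYear and PromoInterval.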
[Figure: dendrogram] The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.

Sample

First rows

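Index | Date | Store | DayOfWeek | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
0 | 2013-01-01 | 353 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | b | 900.0 | NaN | NaN | 1.0 | 14.0 | 2013.0 | Feb,May,Aug,Nov
1 | 2013-01-01 | 335 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | a | 90.0 | NaN | NaN | 1.0 | 31.0 | 2013.0 | Jan,Apr,Jul,Oct
2 | 2013-01-01 | 512 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | b | 590.0 | NaN | NaN | 1.0 | 5.0 | 2013.0 | Mar,Jun,Sept,Dec
3 | 2013-01-01 | 494 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | a | 1260.0 | 6.0 | 2011.0 | 0.0 | NaN | NaN | NaN
4 | 2013-01-01 | 530 | 2.0 | 1.0 | 0.0 | a | 1.0 | a | c | 18160.0 | NaN | NaN | 0.0 | NaN | NaN | NaN
5 | 2013-01-01 | 423 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | a | 1270.0 | 5.0 | 2014.0 | 0.0 | NaN | NaN | NaN
6 | 2013-01-01 | 85 | 2.0 | 1.0 | NaN | a | 1.0 | b | a | 1870.0 | 10.0 | 2011.0 | 0.0 | NaN | NaN | NaN
7 | 2013-01-01 | 274 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | b | 3640.0 | NaN | NaN | 1.0 | 10.0 | 2013.0 | Jan,Apr,Jul,Oct
8 | 2013-01-01 | 262 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | a | 1180.0 | 5.0 | 2013.0 | 0.0 | NaN | NaN | NaN
9 | 2013-01-01 | 259 | 2.0 | 1.0 | 0.0 | a | 1.0 | b | b | 210.0 | NaN | NaN | 0.0 | NaN | NaN | NaN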

Last rows

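Index | Date | Store | DayOfWeek | Open | Promo | StateHoliday | SchoolHoliday | StoreType | Assortment | CompetitionDistance | CompetitionOpenSinceMonth | CompetitionOpenSinceYear | Promo2 | Promo2SinceWeek | Promo2SinceYear | PromoInterval
355644 | 2014-01-31 | 739 | 5.0 | 1.0 | 0.0 | 0 | 0.0 | d | c | 2770.0 | 6.0 | 2008.0 | 1.0 | 22.0 | 2011.0 | Jan,Apr,Jul,Oct
355645 | 2014-01-31 | 740 | 5.0 | 1.0 | NaN | 0 | NaN | d | a | 6400.0 | 3.0 | 2014.0 | 0.0 | NaN | NaN | NaN
355646 | 2014-01-31 | 741 | 5.0 | 1.0 | 0.0 | 0 | 1.0 | d | c | 11900.0 | NaN | NaN | 0.0 | NaN | NaN | NaN
355647 | 2014-01-31 | 743 | 5.0 | 1.0 | 0.0 | 0 | 0.0 | a | a | 6710.0 | 11.0 | 2003.0 | 1.0 | 14.0 | 2012.0 | Jan,Apr,Jul,Oct
355648 | 2014-01-31 | 744 | 5.0 | 1.0 | 0.0 | 0 | 0.0 | a | a | 1370.0 | 12.0 | 2011.0 | 1.0 | 40.0 | 2014.0 | Jan,Apr,Jul,Oct
355649 | 2014-01-31 | 745 | 5.0 | 1.0 | 0.0 | 0 | 0.0 | a | a | 17650.0 | 11.0 | 2013.0 | 1.0 | 37.0 | 2009.0 | Jan,Apr,Jul,Oct
355650 | 2014-01-31 | 746 | 5.0 | 1.0 | 0.0 | 0 | 0.0 | d | c | 4330.0 | 2.0 | 2011.0 | 1.0 | 35.0 | 2011.0 | Mar,Jun,Sept,Dec
355651 | 2014-01-31 | 747 | 5.0 | 1.0 | 0.0 | 0 | 0.0 | c | c | 45740.0 | 8.0 | 2008.0 | 0.0 | NaN | NaN | NaN
355652 | 2014-01-31 | 765 | 5.0 | 1.0 | 0.0 | 0 | 0.0 | a | c | 25430.0 | 5.0 | 1999.0 | 1.0 | 37.0 | 2009.0 | Jan,Apr,Jul,Oct
355653 | 2014-01-31 | 742 | 5.0 | 1.0 | 0.0 | 0 | 0.0 | d | c | 4380.0 | NaN | NaN | 0.0 | NaN | NaN | NaN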